Add missing lock to Constraint-aware append #7515

Open: erimatnor wants to merge 2 commits into main from fix-constraint-aware-append-missing-lock

Conversation

@erimatnor (Contributor) commented Dec 4, 2024

During parallel scans of Constraint-aware append, it seems like runtime chunk exclusion happens in every parallel worker. However, parallel workers don't take a lock on the relation before calling relation_excluded_by_constraints(), which makes them hit an assertion: an AccessShareLock or stronger must be held on the relation.

Ideally, runtime chunk exclusion should happen only once, in the parallel leader, but that requires a bigger refactor that is left for the future.
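
For illustration, here is a minimal sketch of the idea, with a hypothetical helper name; it is not the actual TimescaleDB patch. The point is simply to take an AccessShareLock on the chunk relation before runtime exclusion runs, so that a parallel worker reaching this code without a lock satisfies the assertion in relation_excluded_by_constraints(). LockRelationOid() is the standard PostgreSQL call; if the backend already holds the lock (as the parallel leader does), it only bumps the local lock count.

    /* Minimal sketch (hypothetical helper), not the actual patch: make sure
     * the chunk relation is locked before runtime constraint exclusion runs
     * in this backend. */
    #include "postgres.h"
    #include "storage/lmgr.h"      /* LockRelationOid() */
    #include "storage/lockdefs.h"  /* AccessShareLock */

    static void
    ensure_chunk_locked_for_exclusion(Oid chunk_relid)
    {
        /* If this backend already holds an AccessShareLock or stronger,
         * this only increments the local lock count. */
        LockRelationOid(chunk_relid, AccessShareLock);
    }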

@erimatnor added the bug label Dec 4, 2024
@erimatnor (Contributor, Author) commented Dec 4, 2024

Unfortunately, the assertion failure is not easy to reproduce in a test because it doesn't happen every time; it typically requires a clean-slate session. I might spend some more time writing a test for it later.

codecov bot commented Dec 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.32%. Comparing base (59f50f2) to head (f390b0c).
Report is 744 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7515      +/-   ##
==========================================
+ Coverage   80.06%   81.32%   +1.25%     
==========================================
  Files         190      242      +52     
  Lines       37181    44891    +7710     
  Branches     9450    11204    +1754     
==========================================
+ Hits        29770    36507    +6737     
- Misses       2997     3990     +993     
+ Partials     4414     4394      -20     


@erimatnor (Contributor, Author) commented:

Here's a script that reproduces the assertion crash:

select setseed(0.2);

create table readings(time timestamptz, device int, temp float);

select create_hypertable('readings', 'time', create_default_indexes => false);

insert into readings
select t, ceil(random()*10), random()*40
from generate_series('2022-06-01'::timestamptz, '2022-06-20 00:01:00', '1s') t;

alter table readings set (
      timescaledb.compress,
      timescaledb.compress_orderby = 'time',
      timescaledb.compress_segmentby = 'device'
);

--insert into readings values ('2022-06-01', 1, 1.0), ('2022-06-02', 2, 2.0), ('2022-08-02', 3, 3.0);

create index on readings (time);

select format('%I.%I', chunk_schema, chunk_name)::regclass as chunk
  from timescaledb_information.chunks
 where format('%I.%I', hypertable_schema, hypertable_name)::regclass = 'readings'::regclass
 limit 1 \gset

set timescaledb.enable_chunk_append=off;

select compress_chunk(ch) from show_chunks('readings') ch;
select decompress_chunk('_timescaledb_internal._hyper_1_3_chunk');

--set timescaledb.enable_chunk_append=on;

explain (analyze)
select time, avg(temp), device from readings
where time > now() - interval '2 years 5 months 20 days' group by time, device order by time;
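
For reference, the script above relies on the psql-only \gset metacommand, so it has to be run through psql; and since this is an Assert failure, it only reproduces against a build with assertions enabled (a plain release build will not crash).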

@erimatnor (Contributor, Author) commented Dec 10, 2024

@akuzm After some more investigation, this seems to be related to parallel query plans, where the spawned workers don't have the relation locked. I don't know whether parallel workers are expected to lock the relations referenced in plan nodes at the start of plan execution.

@akuzm (Member) commented Dec 10, 2024

> @akuzm After some more investigation, this seems to be related to parallel query plans, where the spawned workers don't have the relation locked. I don't know whether parallel workers are expected to lock the relations referenced in plan nodes at the start of plan execution.

Not sure; e.g., parallel seq scans seemingly don't lock tables in the workers, although I only looked through the code and didn't check. In ChunkAppend, we ultimately switched to performing chunk exclusion in the parallel leader. Maybe it makes sense to do the same here as well. #5857

@erimatnor (Contributor, Author) commented:

>> @akuzm After some more investigation, this seems to be related to parallel query plans, where the spawned workers don't have the relation locked. I don't know whether parallel workers are expected to lock the relations referenced in plan nodes at the start of plan execution.

> Not sure; e.g., parallel seq scans seemingly don't lock tables in the workers, although I only looked through the code and didn't check. In ChunkAppend, we ultimately switched to performing chunk exclusion in the parallel leader. Maybe it makes sense to do the same here as well. #5857

If we can do chunk exclusion only once, that would be better. But TBH, I am not sure it is worth the effort given how seldom this scan node is used, so for now I think just taking the lock in the worker is enough. We can always implement leader-side exclusion later.

@erimatnor requested a review from @akuzm December 20, 2024 08:09
@erimatnor (Contributor, Author) commented:

@akuzm Are we good to proceed with this fix? The lack of locking only shows up in parallel queries, so it seems correct to grab the lock if it is not already taken. It also looks like a very low-risk fix.

@akuzm (Member) commented Dec 23, 2024

> @akuzm Are we good to proceed with this fix? The lack of locking only shows up in parallel queries, so it seems correct to grab the lock if it is not already taken. It also looks like a very low-risk fix.

Sure. Let's move the lock to before constify_restrictinfos; that's a complicated function that is usually called during planning, and it might have some obscure path that requires a lock as well.

@erimatnor force-pushed the fix-constraint-aware-append-missing-lock branch from 01b22dd to bb9b0f2 on February 10, 2025 12:39
@erimatnor (Contributor, Author) commented:

>> @akuzm Are we good to proceed with this fix? The lack of locking only shows up in parallel queries, so it seems correct to grab the lock if it is not already taken. It also looks like a very low-risk fix.

> Sure. Let's move the lock to before constify_restrictinfos; that's a complicated function that is usually called during planning, and it might have some obscure path that requires a lock as well.

Moved the lock to before constify_restrictinfos.
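
To make the ordering concrete, here is a rough sketch of the exec-start path after this change. Names and structure are illustrative rather than the actual TimescaleDB code, and constify_restrictinfos is TimescaleDB-internal, so its call is only indicated as a comment; LockRelationOid() and relation_excluded_by_constraints() are standard PostgreSQL functions.

    #include "postgres.h"
    #include "nodes/parsenodes.h"     /* RangeTblEntry */
    #include "nodes/pathnodes.h"      /* PlannerInfo, RelOptInfo */
    #include "optimizer/plancat.h"    /* relation_excluded_by_constraints() */
    #include "storage/lmgr.h"         /* LockRelationOid() */
    #include "storage/lockdefs.h"     /* AccessShareLock */

    /* Illustrative helper: decide at runtime whether a chunk can be skipped. */
    static bool
    chunk_can_be_excluded(PlannerInfo *root, RelOptInfo *rel,
                          RangeTblEntry *rte, Oid chunk_relid)
    {
        /* 1. Lock first: a parallel worker may not hold any lock yet. */
        LockRelationOid(chunk_relid, AccessShareLock);

        /* 2. Constify the restriction clauses (constify_restrictinfos),
         *    which runs planner-style code and may also expect the lock. */

        /* 3. Runtime constraint exclusion; the relation must be locked
         *    with AccessShareLock or stronger. */
        return relation_excluded_by_constraints(root, rel, rte);
    }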

@erimatnor force-pushed the fix-constraint-aware-append-missing-lock branch from bb9b0f2 to 7015f2d on February 10, 2025 12:46
@erimatnor force-pushed the fix-constraint-aware-append-missing-lock branch from 7015f2d to e047e43 on February 10, 2025 13:02